synthetic data
Appendix
A.4 EstimatingparameterswhenY(t)isunavailable New parameter estimators that leverage only the available data need to be derived whenY(t) is unavailable. The derivation goes as follows: first, we eliminateY(t) from the model equations. The squared error of the estimated parameters are shown in Figure 1. First, we estimated the parameters separately for each individual. Second, we performed statistical analysis to find associations between the estimated parameters and the demographic variables.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (4 more...)
- Law Enforcement & Public Safety > Fraud (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Tax (1.00)
- (4 more...)
- North America > United States > Illinois (0.05)
- Europe > France (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Asia > Middle East > Jordan (0.04)
From Collapse to Improvement: Statistical Perspectives on the Evolutionary Dynamics of Iterative Training on Contaminated Sources
Bakshi, Soham, Chakraborty, Sunrit
The problem of model collapse has presented new challenges in iterative training of generative models, where such training with synthetic data leads to an overall degradation of performance. This paper looks at the problem from a statistical viewpoint, illustrating that one can actually hope for improvement when models are trained on data contaminated with synthetic samples, as long as there is some amount of fresh information from the true target distribution. In particular, we consider iterative training on samples sourced from a mixture of the true target and synthetic distributions. We analyze the entire iterative evolution in a next-token prediction language model, capturing how the interplay between the mixture weights and the sample size controls the overall long-term performance. With non-trivial mixture weight of the true distribution, even if it decays over time, simply training the model in a contamination-agnostic manner with appropriate sample sizes can avoid collapse and even recover the true target distribution under certain conditions. Simulation studies support our findings and also show that such behavior is more general for other classes of models.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Information Technology > Security & Privacy (0.67)
- Asia > Middle East > Israel (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (5 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Offline Behavior Distillation
Inspired by dataset distillation (DD) [Wang et al., 2018, Zhao et al., (Corollary 1). Extensive experiments on nine datasets of D4RL benchmark [Fu et al., 2020] with multiple environments and data qualities illustrate that our Av-PBC remarkably promotes the OBD performance, Moreover, Av-PBC has a significant convergence speed and requires only a quarter of distillation steps compared to DBC and PBC.
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- (2 more...)